Skip to content

[NFC] Use pascal-style string storage for IString/Name#8662

Open
kripken wants to merge 22 commits intoWebAssembly:mainfrom
kripken:pascal
Open

[NFC] Use pascal-style string storage for IString/Name#8662
kripken wants to merge 22 commits intoWebAssembly:mainfrom
kripken:pascal

Conversation

@kripken
Copy link
Copy Markdown
Member

@kripken kripken commented Apr 30, 2026

Previously our interned strings were std::string_view, which means a
pointer and a size. Instead, store the size as a header alongside the
string data. As we have many views on the same data, this reduces
memory usage, basically anywhere we use a Name, which is every
Call, GlobalSet, Load, etc.

I see a 5% RAM reduction, and a small improvement in instructions
and branches. Wall time improves by around 1% though that is
close to the noise.

This removes the "reuse" optimization where we could reuse a
string from the input. That was error-prone and a bad idea anyhow.
In practice, it might have helped a little, but this new model is simpler
and saves a lot more. (We can't reuse now since we need to convert
to pascal-style storage anyhow.)

@kripken kripken requested a review from tlively April 30, 2026 16:46
@kripken kripken requested a review from a team as a code owner April 30, 2026 16:46
Comment thread src/support/istring.h Outdated
Comment thread src/support/istring.cpp
Comment on lines +26 to +28
if (s.data() == nullptr) {
return nullptr;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is is not valid for empty string views to have nullptr data pointers?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is valid, though we'd need to be careful with hashing etc. Anyhow, this empty string does not need any storage, so it seems simpler to just skip it (might also be faster but I didn't measure).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if someone tries to intern the string_view {nullptr, 0} and we create a View at nullptr, then trying to turn that View back into a string_view will segfault. That seems bad, right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't a std::string_view be null (internal pointer is nullptr)? Where would it segfault?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where it tries to dereference the pointer in the View (actually 4 bytes before it) to get the size.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, View needs to special-case a null to avoid a read. I also see that passing nullptr, 0 is not defined behavior anyhow. Fixed.

Comment thread src/support/istring.cpp
Comment thread src/support/istring.h Outdated
Comment thread src/support/istring.h Outdated
Comment thread src/support/istring.cpp Outdated
Comment thread src/support/istring.cpp
kripken added 3 commits May 4, 2026 16:18
Merge remote-tracking branch 'origin/main' into pascal
Copy link
Copy Markdown
Member

@tlively tlively left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simplification!

Comment thread src/support/istring.h Outdated
Comment thread src/support/istring.h
char operator[](int x) const { return str[x]; }

explicit operator bool() const { return str.data() != nullptr; }
explicit operator bool() const { return str.internal != nullptr; }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interned empty string views can have str.internal == nullptr, but now that is interpreted as the absence of a name rather than the presence of an empty name. I don't think this is the intended behavior.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty name is non-null internal pointing to some place, and that place has a size of 0? I think this should be ok. Pushed a unit test now, did I do something wrong there?

Comment thread src/support/istring.h Outdated
Comment thread test/gtest/istring.cpp
Comment on lines +24 to +26
auto null = IString();
auto empty = IString("");
EXPECT_NE(null, empty);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need this:

auto default = IString(std::string_view{});

This should be the same as empty, but I think it will currently be the same as null.

Copy link
Copy Markdown
Member Author

@kripken kripken May 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, why should that be empty? I believe the default std::string_view is a null. The test I just added confirms that unless I'm confused.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well std::string_view{} == std::string_view("") holds, so the interned versions of those should be the same as well.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, supporting that would add overhead. Do you think it's worth it?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be any extra overhead except the first time we intern the empty stringref. We just need to remove the early return I commented on here and let the default empty string_view get interned like any other string_view.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That almost works but we also need to handle the case of comparing that interned string to an IString(), which never had anything interned for it, and just contains nullptr. That requires either

  1. Making IString() (the default value throughout the codebase) call interned() to intern.
  2. Make operator== return "equal" if both sizes are 0, no matter what data they point to.

I'd rather not add either overhead. I don't think we must follow std::string_view's API. (Instead, we would be following other languages that do differentiate the null string from the empty string, so it's not like I'm suggesting some chaotic behavior 😄 )

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the interned empty string_view won't have nullptr data, so it won't be confused with a default IString, which will have nullptr data, right? It's expected that IString() is not equal to IString(std::string_view{}) because the latter should be equal to IString("").

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, maybe I misunderstood you before then. When you said

Well std::string_view{} == std::string_view("") holds, so the interned versions of those should be the same as well.

Did you mean "IF we interned the null string it should compare equal to the empty string (BUT we don't so things are ok as is?)"

If not, then I am confused. What concrete change to the API are you asking for?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note that I added a test early in this thread - is that test wrong in your opinion? What should the test actually check at the API level?)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect these test expectations to pass both before and after this refactoring:

ASSERT_EQ(IString(""), IString(std::string_view{})); // !
ASSERT_NE(IString(), IString(""));
ASSERT_NE(IString(), IString(std::string_view{})); // !
ASSERT_FALSE(IString().is());
ASSERT_TRUE(IString("").is());
ASSERT_TRUE(IString(std::string_view{}).is()); // !

My understanding of the current change is that the assertions involving the default string_view will fail. Since this change is supposed to be NFC, this is a bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants